BLAS(3F)
NAME
BLAS, libblas - Basic Linear Algebra Subprograms
DESCRIPTION
BLAS is a library of routines that perform basic operations involving
matrices and vectors. They were designed as a way of achieving efficiency
in the solution of linear algebra problems. The BLAS, as they are now
commonly called, have been very successful and have been used in a wide
range of software, including LINPACK, LAPACK and many of the algorithms
published by the ACM Transactions on Mathematical Software. They are an
aid to clarity, portability, modularity and maintenance of software, and
have become the de facto standard for elementary vector and matrix
operations.
The BLAS promote modularity by identifying frequently occurring operations
of linear algebra and by specifying a standard interface to these
operations. Efficiency is achieved through optimization within the BLAS
without altering the higher-level code that has referenced them.
There are three levels of BLAS. The original set of BLAS, commonly
referred to as the Level 1 BLAS, perform low-level operations such as the
dot product and the addition of a multiple of one vector to another. Typically
these operations involve O(N) floating point operations and O(N) data
items moved (loaded or stored), where N is the length of the vectors. The
Level 1 BLAS permit efficient implementation on scalar machines, but the
ratio of floating-point operations to data movement is too low to achieve
effective use of most vector or parallel hardware.
The Level 2 BLAS perform Matrix-Vector operations that occur frequently
in the implementation of many of the most common linear algebra
algorithms. They involve O(N^2) floating point operations. Algorithms
that use Level 2 BLAS can be very efficient on vector computers, but are
not well suited to computers with a hierarchy of memory (such as cache
memory).
The Level 3 BLAS are targeted at matrix-matrix operations. These
operations generally involve O(N^3) floating point operations, while only
creating O(N^2) data movement. These operations permit efficient reuse of
data that resides in cache and create what is often called the
surface-to-volume effect for the ratio of computations to data movement. In
addition, matrices can be partitioned into blocks, and operations on
distinct blocks can be performed in parallel, and within the operations
on each block, scalar or vector operations may be performed in parallel.
BLAS2 and BLAS3 modules have been optimized and parallelized to take
advantage of SGI's RISC parallel architecture. The best performance is
achieved for BLAS3 routines (e.g. DGEMM), where "outer-loop" unrolling
and "blocking" techniques were applied to take advantage of the memory
cache. The performance of BLAS2 routines (e.g. DGEMV) is sensitive to
the size of the problem; for large sizes the high rate of cache misses
slows down the algorithms.
LAPACK algorithms preferably use BLAS3 modules and are the most
efficient. LINPACK uses only BLAS1 modules and is therefore less
efficient than LAPACK.
To link with "libblas", it is advised to use "f77" to load all the
required Fortran libraries. For Power Challenge, you should use the
mips4 version. This is accomplished by using -mips4 when linking:

     f77 -mips4 -o foobar.out foo.o bar.o -lblas
To use the parallelized version, use

     f77 -mips4 -o foobar.out foo.o bar.o -lblas_mp
SUMMARY
BLAS Level 1:
     .....function......          ....prefix,suffix.....       rootname
     dot product                  s-  d-  c-u  c-c  z-u  z-c   -dot-
     y = a*x + y                  s-  d-  c-   z-               -axpy
     setup Givens rotation        s-  d-                        -rotg
     apply Givens rotation        s-  d-  cs-  zd-              -rot
     copy x into y                s-  d-  c-   z-               -copy
     swap x and y                 s-  d-  c-   z-               -swap
     Euclidean norm               s-  d-  sc-  dz-              -nrm2
     sum of absolute values       s-  d-  sc-  dz-              -asum
     x = a*x                      s-  d-  cs-  c-  zd-  z-      -scal
     index of max abs value       is- id-  ic-  iz-             -amax
BLAS Level 2:
     MV   Matrix vector multiply
     R    Rank one update to a matrix
     R2   Rank two update to a matrix
     SV   Solving certain triangular matrix problems.

       single precision Level 2 BLAS      |  double precision Level 2 BLAS
     -----------------------------------------------------------------------
            MV   R    R2   SV             |        MV   R    R2   SV
     SGE    x    x                        |  DGE   x    x
     SGB    x                             |  DGB   x
     SSP    x    x    x                   |  DSP   x    x    x
     SSY    x    x    x                   |  DSY   x    x    x
     SSB    x                             |  DSB   x
     STR    x              x              |  DTR   x              x
     STB    x              x              |  DTB   x              x
     STP    x              x              |  DTP   x              x
       complex Level 2 BLAS          |  double precision complex Level 2 BLAS
     -----------------------------------------------------------------------
            MV   R    RC   RU   R2   SV   |        MV   R    RC   RU   R2   SV
     CGE    x         x    x              |  ZGE   x         x    x
     CGB    x                             |  ZGB   x
     CHE    x    x              x         |  ZHE   x    x              x
     CHP    x    x              x         |  ZHP   x    x              x
     CHB    x                             |  ZHB   x
     CTR    x                        x    |  ZTR   x                        x
     CTB    x                        x    |  ZTB   x                        x
     CTP    x                        x    |  ZTP   x                        x
BLAS Level 3:
     MM   Matrix matrix multiply
     RK   Rank-k update to a matrix
     R2K  Rank-2k update to a matrix
     SM   Solving triangular matrix with many right-hand sides.

       single precision Level 3 BLAS      |  double precision Level 3 BLAS
     -----------------------------------------------------------------------
            MM   RK   R2K  SM             |        MM   RK   R2K  SM
     SGE    x                             |  DGE   x
     SSY    x    x    x                   |  DSY   x    x    x
     STR    x              x              |  DTR   x              x

       complex Level 3 BLAS          |  double precision complex Level 3 BLAS
     -----------------------------------------------------------------------
            MM   RK   R2K  SM             |        MM   RK   R2K  SM
     CGE    x                             |  ZGE   x
     CSY    x    x    x                   |  ZSY   x    x    x
     CHE    x    x    x                   |  ZHE   x    x    x
     CTR    x              x              |  ZTR   x              x
C INTERFACE
There is a C interface for the BLAS library. The implementation is based
on the proposed specification for BLAS routines in C [1].
The argument lists closely follow the equivalent Fortran ones. The main
changes are that enumeration types are used instead of character types
for option specification, and two-dimensional arrays are stored in one-
dimensional C arrays in the same column-major fashion as a Fortran
array. Therefore, a matrix A would be stored as:

     double (*a)[lda*n];
     /*                                            */
     /* a is a pointer to an array of size lda*n   */
     /*                                            */

where element A(i+1,j) of matrix A is stored immediately after the
element A(i,j), while A(i,j+1) is lda elements apart from A(i,j). The
element A(i,j) of the matrix can be accessed directly by reference to
a[ (j-1)*lda + (i-1) ].
The names of the C versions of the BLAS are the same as the Fortran
versions since the compiler puts the Fortran names in upper case and adds
an underscore after the name.
The argument lists use the following data types:
Integer: an integer data type of 32 bits.
float: the regular single precision floating-point type.
double: the regular double precision floating-point type.
Complex: a single precision complex type.
Zomplex: a double precision complex type.
plus the enumeration types given by
typedef enum { NoTranspose, Transpose, ConjugateTranspose }
MatrixTranspose;
typedef enum { UpperTriangle, LowerTriangle }
MatrixTriangle;
typedef enum { UnitTriangular, NotUnitTriangular }
MatrixUnitTriangular;
typedef enum { LeftSide, RightSide }
OperationSide;
The complex data types are stored in Cartesian form, i.e., as real and
imaginary parts. For example
typedef struct { float real;
float imag;
} Complex;
typedef struct { double real;
double imag;
} Zomplex;
The operations performed by the C BLAS are identical to those performed
by the corresponding Fortran BLAS, as specified in [2], [3] and [4].
To use the C BLAS, link with "libblas". It is advised to use "f77" to
load all the Fortran libraries required:

     f77 -o foobar.out foo.o bar.o -lblas
FILES
/usr/lib/libblas.a
/usr/lib/libblas_mp.a
/usr/include/cblas.h
ORIGIN
The original Fortran source code comes from netlib.
REFERENCES
[1] S.P. Datardina, J.J. Du Croz, S.J. Hammarling and M.W. Pont, "A
    Proposed Specification of BLAS Routines in C", NAG Technical Report
    TR6/90.
[2] C. Lawson, R. Hanson, D. Kincaid, and F. Krogh, "Basic Linear
    Algebra Subprograms for Fortran Usage", ACM Trans. on Math. Soft. 5
    (1979) 308-325.
[3] J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson, "An Extended
    Set of Fortran Basic Linear Algebra Subprograms", ACM Trans. on
    Math. Soft. 14, 1 (1988) 1-32.
[4] J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling, "A Set of
    Level 3 Basic Linear Algebra Subprograms", ACM Trans. on Math.
    Soft. (Dec 1989).